Abstract

Color constancy is the recovery of true surface color from observed color, and 
requires estimating the chromaticity of scene illumination to correct for the bias 
it induces. In this paper, we show that the per-pixel color statistics of natural
scenes, without any spatial or semantic context, can by themselves be a powerful
cue for color constancy. Specifically, we describe an illuminant estimation
method that is built around a “classifier” for identifying the true chromaticity of
a pixel given its luminance (absolute brightness across color channels). During
inference, each pixel’s observed color restricts its true chromaticity to those values
that can be explained by one of a candidate set of illuminants, and applying
the classifier over these values yields a distribution over the corresponding illuminants.
A global estimate for the scene illuminant is computed through a simple
aggregation of these distributions across all pixels. We begin by simply defining
the luminance-to-chromaticity classifier by computing empirical histograms
over discretized chromaticity and luminance values from a training set of natural
images. These histograms reflect a preference for hues corresponding to smooth
reflectance functions, and for achromatic colors in brighter pixels. Despite its
simplicity, the resulting estimation algorithm outperforms current state-of-the-art
color constancy methods. Next, we propose a method to learn the luminance-to-chromaticity
classifier “end-to-end”. Using stochastic gradient descent, we set
chromaticity-luminance likelihoods to minimize errors in the final scene illuminant
estimates on a training set. This leads to further improvements in accuracy,
most significantly in the tail of the error distribution.


1 Introduction 

The spectral distribution of light reflected off a surface is a function of an intrinsic material property
of the surface (its reflectance) and also of the spectral distribution of the light illuminating the
surface. Consequently, the observed color of the same surface under different illuminants in different
images will be different. To be able to reliably use color computationally for identifying materials
and objects, researchers are interested in deriving an encoding of color from an observed image that
is invariant to changing illumination. This task is known as color constancy, and requires resolving
the ambiguity between illuminant and surface colors in an observed image. Since both of these
quantities are unknown, much of color constancy research is focused on identifying models and
statistical properties of natural scenes that are informative for color constancy. While psychophysical
experiments have demonstrated that the human visual system is remarkably successful at achieving
color constancy [1], it remains a challenging task computationally.

Early color constancy algorithms were based on relatively simple models for pixel colors. For 
example, the gray world method [2] simply assumed that the average true intensities of different
color channels across all pixels in an image would be equal, while the white-patch retinex method [3]
assumed that the true color of the brightest pixels in an image is white. Most modern color constancy
methods, however, are based on more complex reasoning with higher-order image features. Many
methods [4, 5, 6] use models for image derivatives instead of individual pixels. Others are based on
recognizing and matching image segments to those in a training set to recover true color [7]. A recent
method proposes the use of a multi-layer convolutional neural network (CNN) to regress from image 
patches to illuminant color. There are also many “combination-based” color constancy algorithms, 
that combine illuminant estimates from a number of simpler “unitary” algorithms [8, 9, 10, 11], 
sometimes using image features to give higher weight to the outputs of some subset of methods. 
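
The two early assumptions mentioned above can be illustrated with a minimal NumPy sketch. This is
only an illustration of the stated assumptions, not the implementation used in [2] or [3]; the function
names and the `top_fraction` parameter are hypothetical choices made for this example.

```python
import numpy as np

def gray_world(image):
    """Gray world: if the average true color is achromatic, the per-channel
    mean of the observed image is proportional to the illuminant."""
    m = image.reshape(-1, 3).mean(axis=0)
    return m / np.linalg.norm(m)

def white_patch(image, top_fraction=0.01):
    """White patch: if the brightest pixels are truly white, their observed
    color is proportional to the illuminant. `top_fraction` is a hypothetical
    robustness knob (average over the brightest pixels instead of one pixel)."""
    pixels = image.reshape(-1, 3)
    luminance = pixels.sum(axis=1)
    k = max(1, int(round(top_fraction * len(pixels))))
    brightest = pixels[np.argsort(luminance)[-k:]]
    m = brightest.mean(axis=0)
    return m / np.linalg.norm(m)
```

Both functions return a unit-norm illuminant direction, since the illuminant can only be recovered up
to a scalar factor.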

In this paper, we demonstrate that by appropriately modeling and reasoning with the statistics of 
individual pixel colors, one can computationally recover illuminant color with high accuracy. We 
consider individual pixels in isolation, where the color constancy task reduces to discriminating
between the possible choices of true color for the pixel that are feasible given the observed color and a
candidate set of illuminants. Central to our method is a function that gives us the relative likelihoods 
of these true colors, and therefore a distribution over the corresponding candidate illuminants. Our 
global estimate for the scene illuminant is then computed by simply aggregating these distributions 
across all pixels in the image. 

We formulate the likelihood function as one that measures the conditional likelihood of true pixel 
chromaticity given observed luminance, in part to be agnostic to the scalar (i.e., color
channel-independent) ambiguity in observed color intensities. Moreover, rather than committing to a
parametric form, we quantize the space of possible chromaticity and luminance values, and define the
function over this discrete domain. We begin by setting the conditional likelihoods purely
empirically, based simply on the histograms of true color values over all pixels in all images across a
training set. Even with this purely empirical approach, our estimation algorithm yields estimates 
with higher accuracy than current state-of-the-art methods. Then, we investigate learning the per- 
pixel belief function by optimizing an objective based on the accuracy of the final global illuminant 
estimate. We carry out this optimization using stochastic gradient descent, and using a sub-sampling 
approach (similar to “dropout” [12]) to improve generalization beyond the training set. This further 
improves estimation accuracy, without adding to the computational cost of inference. 

2 Preliminaries 

Assuming Lambertian reflection, the spectral distribution of light reflected by a material is a product
of the distribution of the incident light and the material’s reflectance function. The color intensity
vector $v(n) \in \mathbb{R}^3$ recorded by a tri-chromatic sensor at each pixel $n$ is then given by

$$v(n) = \int \kappa(n, \lambda)\, \ell(n, \lambda)\, s(n)\, \Pi(\lambda)\, d\lambda, \tag{1}$$

where $\kappa(n, \lambda)$ is the reflectance at $n$, $\ell(n, \lambda)$ is the spectral distribution of the incident illumination,
$s(n)$ is a geometry-dependent shading factor, and $\Pi(\lambda) \in \mathbb{R}^3$ denotes the spectral sensitivities of
the color sensors. Color constancy is typically framed as the task of computing from $v(n)$ the
corresponding color intensities $x(n) \in \mathbb{R}^3$ that would have been observed under some canonical
illuminant $\ell_{\mathrm{ref}}$ (typically chosen to be $\ell_{\mathrm{ref}}(\lambda) = 1$). We will refer to $x(n)$ as the “true color” at $n$.

Since (1) involves a projection of the full incident light spectrum on to the three filters $\Pi(\lambda)$, it is not
generally possible to recover $x(n)$ from $v(n)$ even with knowledge of the illuminant $\ell(n, \lambda)$. However,
a commonly adopted approximation (shown to be reasonable under certain assumptions [13])
is to relate the true and observed colors $x(n)$ and $v(n)$ by a simple per-channel adaptation:

$$v(n) = m(n) \circ x(n), \tag{2}$$

where $\circ$ refers to the element-wise Hadamard product, and $m(n) \in \mathbb{R}^3$ depends on the illuminant
$\ell(n, \lambda)$ (for $\ell_{\mathrm{ref}}$, $m = [1, 1, 1]^T$). With some abuse of terminology, we will refer to $m(n)$ as the
illuminant in the remainder of the paper. Moreover, we will focus on the single-illuminant case in
this paper, and assume $m(n) = m, \forall n$ in an image. Our goal during inference will be to estimate
this global illuminant $m$ from the observed image $v(n)$. The true color image $x(n)$ can then simply
be recovered as $m^{-1} \circ v(n)$, where $m^{-1} \in \mathbb{R}^3$ denotes the element-wise inverse of $m$.

Note that color constancy algorithms seek to resolve the ambiguity between $m$ and $x(n)$ in (2) only
up to a channel-independent scalar factor. This is because scalar ambiguities show up in $m$ between
$\ell$ and $\ell_{\mathrm{ref}}$ due to light attenuation, between $x(n)$ and $\kappa(n)$ due to the shading factor $s(n)$, and in
the observed image $v(n)$ itself due to varying exposure settings. Therefore, the performance metric
typically used is the angular error $\cos^{-1}\left(\frac{m^T \bar{m}}{\|m\|_2 \|\bar{m}\|_2}\right)$ between the true and estimated illuminant
vectors $m$ and $\bar{m}$.
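
As a small illustration of the correction in (2) and of the angular error metric, the following NumPy
sketch may be helpful. The function names are hypothetical and chosen only for this example; the clipping
of the cosine is an added numerical safeguard rather than part of the definition.

```python
import numpy as np

def correct_image(v, m):
    """Recover true colors as m^{-1} o v(n), element-wise, up to scale."""
    return np.asarray(v, dtype=float) / np.asarray(m, dtype=float)

def angular_error_deg(m_true, m_est):
    """Angle in degrees between true and estimated illuminant vectors.
    Invariant to the scale of either vector."""
    m_true = np.asarray(m_true, dtype=float)
    m_est = np.asarray(m_est, dtype=float)
    cos = m_true @ m_est / (np.linalg.norm(m_true) * np.linalg.norm(m_est))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Because the metric depends only on direction, scaling an estimate by any positive constant leaves its
error unchanged, which matches the scalar ambiguity discussed above.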

Database. For training and evaluation, we use the database of 568 natural indoor and outdoor
images captured under various illuminants by Gehler et al. [14]. We use the version from Shi and
Funt [15] that contains linear images (without gamma correction) generated from the RAW camera
data. The database contains images captured with two different cameras (86 images with a Canon
1D, and 482 with a Canon 5D). Each image contains a color checker chart placed in the image, with
its position manually labeled. The colors of the gray squares in the chart are taken to be the value
of the true illuminant $m$ for each image, which can then be used to correct the image to get true
colors at each pixel (of course, only up to scale). The chart is masked out during evaluation. We use
k-fold cross-validation over this dataset in our experiments. Each fold contains images from both
cameras corresponding to one of k roughly-equal partitions of each camera’s image set (ordered
by file name/order of capture). Estimates for images in each fold are based on training only with
data from the remaining folds. We report results with three- and ten-fold cross-validation. These
correspond to average training set sizes of 379 and 511 images respectively.
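
The fold construction described above could be sketched as follows. This is a minimal illustration that
assumes contiguous partitions of each camera's sorted file list; the function name and the dictionary
input format are hypothetical.

```python
def camera_folds(files_by_camera, k):
    """Split each camera's files (sorted by name) into k contiguous,
    roughly equal partitions; fold j is the union of partition j
    across cameras."""
    folds = [[] for _ in range(k)]
    for files in files_by_camera.values():
        ordered = sorted(files)
        n = len(ordered)
        for j in range(k):
            lo, hi = (j * n) // k, ((j + 1) * n) // k
            folds[j].extend(ordered[lo:hi])
    return folds
```

With 86 and 482 images for the two cameras, each fold holds roughly 568/k images, so the average
training set has about 568(1 - 1/k) images, consistent with the sizes quoted above.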


3 Color Constancy with Pixel-wise Chromaticity Statistics 


A color vector $x \in \mathbb{R}^3$ can be characterized in terms of (1) its luminance $\|x\|_1$, or absolute brightness
across color channels; and (2) its chromaticity, which is a measure of the relative ratios between
intensities in different channels. While there are different ways of encoding chromaticity, we will
do so in terms of the unit vector $\hat{x} = x/\|x\|_2$ in the direction of $x$. Note that since intensities
cannot be negative, $\hat{x}$ is restricted to lie on the non-negative eighth of the unit sphere $\mathbb{S}^2_+$. Remember
from Sec. 2 that our goal is to resolve the ambiguity between the true colors $x(n)$ and the illuminant
$m$ only up to scale. In other words, we need only estimate the illuminant chromaticity $\hat{m}$ and true
chromaticities $\hat{x}(n)$ from the observed image $v(n)$, which we can relate from (2) as

$$\hat{x}(n) = \frac{x(n)}{\|x(n)\|_2} = \frac{\hat{m}^{-1} \circ v(n)}{\|\hat{m}^{-1} \circ v(n)\|_2} \triangleq g(v(n), \hat{m}). \tag{3}$$
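
The mapping in (3) from an observed color and a candidate illuminant chromaticity to a true chromaticity
can be written in a few lines. The sketch below is illustrative only; the function name `g` is borrowed
from the notation used here.

```python
import numpy as np

def g(v, m_hat):
    """True chromaticity implied by observed color(s) v, shape (..., 3),
    under candidate illuminant chromaticity m_hat, shape (3,)."""
    x = np.asarray(v, dtype=float) / np.asarray(m_hat, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)
```

The output is unchanged when either argument is rescaled, so only the chromaticities of the observation
and of the illuminant matter.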


A key property of natural illuminant chromaticities is that they are known to take a fairly restricted
set of values, close to a one-dimensional locus predicted by Planck’s radiation law [16]. To be able
to exploit this, we denote $\mathcal{M} = \{\hat{m}_i\}_{i=1}^{M}$ as the set of possible values for illuminant chromaticity $\hat{m}$,
and construct it from a training set. Specifically, we quantize$^1$ the chromaticity vectors of
the illuminants in the training set, and let $\mathcal{M}$ be the set of unique chromaticity values. Additionally,
we define a “prior” $b_i = \log(n_i/T)$ over this candidate set, based on the number $n_i$ of the $T$ training
illuminants that were quantized to $\hat{m}_i$.
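
Given the quantization bin of each training illuminant, the candidate set and its log prior follow from a
simple count. The sketch below assumes the bin indices have already been computed (the helper name
`candidate_prior` and the integer bin encoding are hypothetical).

```python
import numpy as np

def candidate_prior(bin_ids):
    """bin_ids: one quantization-bin index per training illuminant.
    Returns the unique bins (the candidate set) and b_i = log(n_i / T)."""
    candidates, counts = np.unique(np.asarray(bin_ids), return_counts=True)
    return candidates, np.log(counts / counts.sum())
```

For example, training illuminants falling in bins `[3, 3, 7, 1]` give three candidates with prior
probabilities 1/4, 1/2 and 1/4.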

Given the observed color $v(n)$ at a single pixel $n$, the ambiguity in $\hat{m}$ across the illuminant set $\mathcal{M}$
translates to a corresponding ambiguity in the true chromaticity $\hat{x}(n)$ over the set $\{g(v(n), \hat{m}_i)\}_i$.
Figure 1(a) illustrates this ambiguity for a few different observed colors $v$. We note that while there
is significant angular deviation within the set of possible true chromaticity values for any observed
color, values in each set lie close to a one-dimensional locus in chromaticity space. This suggests
that the illuminants in our training set are indeed a good fit to Planck’s law$^2$.

The goal of our work is to investigate the extent to which we can resolve the above ambiguity
in true chromaticity on a per-pixel basis, without having to reason about the pixel’s spatial
neighborhood or semantic context. Our approach is based on computing a likelihood distribution over
the possible values of $\hat{x}(n)$, given the observed luminance $\|v(n)\|_1$. But as mentioned in Sec. 2,
there is considerable ambiguity in the scale of observed color intensities. We address this
partially by applying a simple per-image global normalization to the observed luminance to define
$y(n) = \|v(n)\|_1 / \mathrm{median}\{\|v(n')\|_1\}_{n'}$. This very roughly compensates for variations across
images due to exposure settings, illuminant brightness, etc. However, note that since the normalization
is global, it does not compensate for variations due to shading.

$^1$ Quantization is over uniformly sized bins in $\mathbb{S}^2_+$. See supplementary material for details.

$^2$ In fact, the chromaticities appear to lie on two curves that are slightly separated from each other. This
separation is likely due to differences in the sensor responses of the two cameras in the Gehler-Shi dataset.

[Figure 1 panels: (a) Ambiguity with Observed Color: set of possible true chromaticities $\{g(v, \hat{m}_i)\}_i$
for a specific observed color $v$; (b) $L[\hat{x}, y]$ from Empirical Statistics; (c) $L[\hat{x}, y]$ from End-to-end Learning.]

Figure 1: Color constancy with per-pixel chromaticity-luminance distributions of natural scenes.
(a) Ambiguity in true chromaticity given observed color: each set of points corresponds to the
possible true chromaticity values (location in $\mathbb{S}^2_+$, see legend) consistent with the pixel’s observed
chromaticity (color of the points) and different candidate illuminants $\hat{m}_i$. (b) Distributions over
different values for true chromaticity of a pixel conditioned on its observed luminance, computed
as empirical histograms over the training set. Values $y$ are normalized per-image by the median
luminance value over all pixels. (c) Corresponding distributions learned with end-to-end training to
maximize accuracy of overall illuminant estimation.

The central component of our inference method is a function $L[\hat{x}, y]$ that encodes the belief that a
pixel with normalized observed luminance $y$ has true chromaticity $\hat{x}$. This function is defined over
a discrete domain by quantizing both chromaticity and luminance values: we clip luminance values
$y$ to four (i.e., four times the median luminance of the image) and quantize them into twenty
equal-sized bins; and for chromaticity $\hat{x}$, we use a much finer quantization with $2^{14}$ equal-sized bins in
$\mathbb{S}^2_+$ (see supplementary material for details). In this section, we adopt a purely empirical approach
and define $L[\hat{x}, y]$ as $L[\hat{x}, y] = \log\left(N_{\hat{x},y} / \sum_{\hat{x}'} N_{\hat{x}',y}\right)$, where $N_{\hat{x},y}$ is the number of pixels,
across all images in a training set, that have true chromaticity $\hat{x}$ and observed luminance $y$.
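
The normalization, luminance binning, and histogram-based definition of the belief function can be
sketched as below. This is a minimal illustration, not the authors' code: the function names are
hypothetical, chromaticity bin indices are assumed to be precomputed, and the small `eps` added to the
counts (to avoid taking the logarithm of zero for empty bins) is an assumption of this sketch.

```python
import numpy as np

def normalized_luminance(v):
    """y(n): L1 luminance of each pixel divided by the image median."""
    lum = np.asarray(v, dtype=float).reshape(-1, 3).sum(axis=1)
    return lum / np.median(lum)

def luminance_bin(y, num_bins=20, clip=4.0):
    """Clip y to `clip` and quantize into `num_bins` equal-sized bins."""
    y = np.clip(np.asarray(y, dtype=float), 0.0, clip)
    return np.minimum((y / clip * num_bins).astype(int), num_bins - 1)

def empirical_log_likelihood(chroma_bins, lum_bins, num_chroma,
                             num_lum=20, eps=1e-6):
    """L[x, y] = log(N[x, y] / sum_x' N[x', y]) from per-pixel bin indices.
    `eps` is a smoothing constant assumed here, not taken from the paper."""
    counts = np.zeros((num_chroma, num_lum))
    np.add.at(counts, (chroma_bins, lum_bins), 1.0)
    counts += eps
    return np.log(counts / counts.sum(axis=0, keepdims=True))
```

Each column of the resulting table is a log-distribution over chromaticity bins for one luminance level.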

We visualize these empirical versions of $L[\hat{x}, y]$ for a subset of the luminance quantization levels
in Fig. 1(b). We find that in general, desaturated chromaticities with similar intensity values in all
color channels are most common. This is consistent with findings of statistical analysis of natural
spectra [17], which shows the “DC” component (flat across wavelength) to be the one with most
variance. We also note that the concentration of the likelihood mass in these chromaticities
increases for higher values of luminance $y$. This phenomenon is also predicted by traditional intuitions
in color science: materials are brightest when they reflect most of the incident light, which
typically occurs when they have a flat reflectance function with all values of $\kappa(\lambda)$ close to one. Indeed,
this is what forms the basis of the white-patch retinex method [3]. Amongst saturated colors, we
find that hues which combine green with either red or blue occur more frequently than primary
colors, with pure green and combinations of red and blue being the least common. This is consistent
with findings that reflectance functions are usually smooth (PCA on pixel spectra in [17] revealed a
Fourier-like basis). Both saturated green and red-blue combinations would require the reflectance to
have either a sharp peak or a sharp trough, respectively, in the middle of the visible spectrum.

We now describe a method that exploits the belief function $L[\hat{x}, y]$ for illuminant estimation. Given
the observed color $v(n)$ at a pixel $n$, we can obtain a distribution $\{L[g(v(n), \hat{m}_i), y(n)]\}_i$ over
the set of possible true chromaticity values $\{g(v(n), \hat{m}_i)\}_i$, which can also be interpreted as a
distribution over the corresponding illuminants $\hat{m}_i$. We then simply aggregate these distributions
across all pixels $n$ in the image, and define the global probability of $\hat{m}_i$ being the scene illuminant
$m$ as $p_i = \exp(l_i) / \sum_{i'} \exp(l_{i'})$, where

$$l_i = \frac{\alpha}{N} \sum_n L[g(v(n), \hat{m}_i), y(n)] + \beta b_i, \tag{4}$$

$N$ is the total number of pixels in the image, and $\alpha$ and $\beta$ are scalar parameters. The final illuminant
chromaticity estimate $\bar{m}$ is then computed as



Note that (4) also incorporates the prior $b_i$ over illuminants. We set the parameters $\alpha$ and $\beta$ using
a grid search, to values that minimize mean illuminant estimation error over the training set. The
primary computational cost of inference is in computing the values of $\{l_i\}$. We pre-compute values
of $g(\hat{x}, \hat{m})$ using (3) over the discrete domain of quantized chromaticity values for $\hat{x}$ and the
candidate illuminant set $\mathcal{M}$ for $\hat{m}$. Therefore, computing each $l_i$ essentially only requires the addition of
$N$ numbers from a look-up table. We need to do this for all $M = |\mathcal{M}|$ illuminants, where
summations for different illuminants can be carried out in parallel. Our implementation takes roughly 0.3
seconds for a 9 mega-pixel image, on a modern Intel 3.3GHz CPU with 6 cores, and is available at
http://www.ttic.edu/chakrabarti/chromcc/.
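
The look-up-and-sum structure of inference can be sketched as follows. This is an illustrative sketch, not
the released implementation: the array layouts and function names are hypothetical, and the final step
in `estimate_illuminant` (a normalized, probability-weighted mean of the candidate chromaticities) is an
assumption made for this example.

```python
import numpy as np

def illuminant_posterior(L, chroma_idx, lum_idx, b, alpha=1.0, beta=1.0):
    """L: (num_chroma, num_lum) belief table.
    chroma_idx: (M, N) bin of the true chromaticity implied for pixel n
                by candidate i (read from a precomputed look-up table).
    lum_idx: (N,) luminance bin per pixel.  b: (M,) log prior."""
    N = lum_idx.shape[0]
    scores = L[chroma_idx, lum_idx[None, :]].sum(axis=1)   # sum over pixels
    l = alpha / N * scores + beta * b
    p = np.exp(l - l.max())                                # stable softmax
    return p / p.sum()

def estimate_illuminant(p, candidates):
    """Assumed final step: normalized probability-weighted mean of the
    (M, 3) candidate chromaticities."""
    mean = p @ candidates
    return mean / np.linalg.norm(mean)
```

The sums for different candidates are independent of each other, which is what allows them to be
computed in parallel.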

This empirical version of our approach bears some similarity to the Bayesian method of [14] that 
is based on priors for illuminants, and for the likelihood of different true reflectance values being
present in a scene. However, the key difference is our modeling of true chromaticity conditioned 
on luminance that explicitly makes estimation agnostic to the absolute scale of intensity values. We 
also reason with all pixels, rather than the set of unique colors in the image. 

Experimental Results. Table 1 compares the performance of illuminant estimation with our 
method (see rows labeled “Empirical”) to the current state-of-the-art, using different quantiles of 
angular error across the Gehler-Shi database [14, 15]. Results for other methods are from the survey 
by Li et al. [18]. (See the supplementary material for comparisons to some other recent methods). 

We show results with both three- and ten-fold cross-validation. We find that our errors with
three-fold cross-validation have lower mean, median, and tri-mean values than those of the best performing
state-of-the-art method from [8], which combines illuminant estimates from twelve different
“unitary” color-constancy methods (many of which are also listed in Table 1) using support-vector
regression. The improvement in error is larger with respect to the other combination methods [8, 9, 10, 11],
as well as those based on the statistics of image derivatives [4, 5, 6]. Moreover, since our method has
more parameters than most previous algorithms ($L[\hat{x}, y]$ has $2^{14} \times 20 \approx 300$k entries), it is likely
to benefit from more training data. We find this to indeed be the case, and observe a considerable
decrease in error quantiles when we switch to ten-fold cross-validation.

Table 1: Quantiles of Angular Error for Different Methods on the Gehler-Shi Database [14, 15]

Method                            Mean     Median   Tri-mean   25%-ile   75%-ile   90%-ile
Bayesian [14]                     6.74°    5.14°    5.54°      2.42°     9.47°     14.71°
Gamut Mapping [20]                6.00°    3.98°    4.52°      1.71°     8.42°     14.74°
Deriv. Gamut Mapping [4]          5.96°    3.83°    4.32°      1.68°     7.95°     14.72°
Gray World [2]                    4.77°    3.63°    3.92°      1.81°     6.63°     10.59°
Gray Edge (1,1,6) [5]             4.19°    3.28°    3.54°      1.87°     5.72°     8.60°
SV-Regression [21]                4.14°    3.23°    3.35°      1.68°     5.27°     8.87°
Spatio-Spectral [6]               3.99°    3.24°    3.45°      2.38°     4.97°     7.50°
Scene Geom. Comb. [9]             4.56°    3.15°    3.46°      1.41°     6.12°     10.39°
Nearest-30% Comb. [10]            4.26°    2.95°    3.19°      1.49°     5.39°     9.67°
Classifier-based Comb. [11]       3.83°    2.75°    2.93°      1.34°     4.89°     8.19°
Neural Comb. (EFM) [8]            3.43°    2.37°    2.62°      1.21°     4.53°     6.97°
SVR-based Comb. [8]               2.98°    1.97°    2.35°      1.13°     4.33°     6.37°
Proposed
  (3-Fold) Empirical              2.89°    1.89°    2.15°      1.15°     3.68°     6.24°
  (3-Fold) End-to-end Trained     2.56°    1.67°    1.89°      0.91°     3.30°     5.56°
  (10-Fold) Empirical             2.55°    1.58°    1.83°      0.85°     3.30°     5.74°
  (10-Fold) End-to-end Trained    2.20°    1.37°    1.53°      0.69°     2.68°     4.89°

Figure 2 shows estimation results with our method for a few sample images. For each image, we
show the input image (indicating the ground truth color chart being masked out) and the output image
with colors corrected by the global illuminant estimate. To visualize the quality of contributions from
individual pixels, we also show a map of angular errors for illuminant estimates from individual
pixels. These estimates are based on values of $l_i$ computed by restricting the summation in (4) to
individual pixels. We find that even these pixel-wise estimates are fairly accurate for many pixels,
even when their true color is saturated (see cart in first row). Also, to evaluate the weight of these
per-pixel distributions to the global $l_i$, we show a map of their variance on a per-pixel basis. As
expected from Fig. 1(b), we note higher variances in relatively brighter pixels. The image in the last
row represents one of the poorest estimates across the entire dataset (higher than 90%-ile). Note
that much of the image is in shadow, and contains only a few distinct (and likely atypical) materials.

4 Learning $L[\hat{x}, y]$ End-to-end

While the empirical approach in the previous section would be optimal if pixel chromaticities in a
typical image were in fact i.i.d., that is clearly not the case. Therefore, in this section we propose an
alternative approach to setting the beliefs in $L[\hat{x}, y]$ that optimizes for the accuracy of the final
global illuminant estimate. However, unlike previous color constancy methods that explicitly model
statistical co-dependencies between pixels (for example, by modeling spatial derivatives [4, 5, 6],
or learning functions on whole-image histograms [21]), we retain the overall parametric “form” by
which we compute the illuminant in (4). Therefore, even though $L[\hat{x}, y]$ itself is learned through
knowledge of co-occurrence of chromaticities in natural images, estimation of the illuminant during
inference is still achieved through a simple aggregation of per-pixel distributions.

Figure 2: Estimation Results on Sample Images. Along with output images corrected with the global
illuminant estimate from our methods, we also visualize illuminant information extracted at a local
level. We show a map of the angular error of pixel-wise illuminant estimates (i.e., computed with $l_i$
based on distributions from only a single pixel). We also show a map of the variance $\mathrm{Var}(\{l_i\}_i)$ of
these beliefs, to gauge the weight of their contributions to the global illuminant estimate.

Specifically, we set the entries of $L[\hat{x}, y]$ to minimize a cost function $C$ over a set of training images:

$$C = \frac{1}{T} \sum_{t=1}^{T} C^t, \qquad C^t = \sum_i \cos^{-1}\left(\hat{m}_i^T \hat{m}^t\right) p_i^t, \tag{6}$$

where $\hat{m}^t$ is the true illuminant chromaticity of the $t$-th training image, and $p_i^t$ is computed from the
observed colors $v^t(n)$ using (4). We augment the training data available to us by “re-lighting” each
image with different illuminants from the training set. We use the original image set and six re-lit
copies for training, and use a seventh copy for validation.
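
Under the per-channel model (2), one plausible way to re-light an image is to divide out its own
illuminant and multiply in another one. The sketch below makes that assumption explicit; the function
name is hypothetical and the exact procedure used for augmentation is not spelled out in the text.

```python
import numpy as np

def relight(v, m_orig, m_new):
    """Assumed re-lighting under v = m o x: replace illuminant m_orig by
    m_new with a per-channel gain. v has shape (..., 3)."""
    gain = np.asarray(m_new, dtype=float) / np.asarray(m_orig, dtype=float)
    return np.asarray(v, dtype=float) * gain
```

The re-lit image then has `m_new` as its ground-truth illuminant, while the true colors are unchanged.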

We use stochastic gradient descent to minimize (6). We initialize $L$ to empirical values as described
in the previous section (for convenience, we multiply the empirical values by $\alpha$, and then set $\alpha = 1$
for computing $l_i$), and then consider individual images from the training set at each iteration. We
make multiple passes through the training set, and at each iteration, we randomly sub-sample the
pixels from each training image. Specifically, we only retain 1/128 of the total pixels in the image
by randomly sub-sampling $16 \times 16$ patches at a time. This approach, which can be interpreted as
being similar to “dropout” [12], prevents over-fitting and improves generalization.
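
The patch-based sub-sampling could look roughly like the sketch below. It assumes non-overlapping,
grid-aligned $16 \times 16$ patches chosen uniformly at random without replacement; the text does not specify
these details, and the function name and signature are hypothetical.

```python
import numpy as np

def subsample_patches(height, width, rng, patch=16, keep=1.0 / 128):
    """Boolean mask that keeps about `keep` of the pixels, selected as
    whole patch-by-patch blocks on a regular grid (an assumption)."""
    gh, gw = height // patch, width // patch
    num_blocks = gh * gw
    num_keep = max(1, int(round(keep * num_blocks)))
    chosen = rng.choice(num_blocks, size=num_keep, replace=False)
    block_mask = np.zeros(num_blocks, dtype=bool)
    block_mask[chosen] = True
    block_mask = block_mask.reshape(gh, gw)
    mask = np.zeros((height, width), dtype=bool)
    full = np.repeat(np.repeat(block_mask, patch, axis=0), patch, axis=1)
    mask[:gh * patch, :gw * patch] = full
    return mask
```

A fresh mask would be drawn at every iteration, so each pass over the training set sees a different
subset of pixels from each image.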
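Derivatives of the per-image cost $C^t$ from (6) with respect to the current values of beliefs $L[\hat{x}, y]$ are given by

$$\frac{\partial C^t}{\partial L[\hat{x}, y]} = \frac{1}{N} \sum_i \left( \sum_n \delta\big(g(v^t(n), \hat{m}_i) = \hat{x}\big)\, \delta\big(y^t(n) = y\big) \right) \times \frac{\partial C^t}{\partial l_i^t}, \tag{7}$$

where

$$\frac{\partial C^t}{\partial l_i^t} = p_i^t \left( \cos^{-1}\left(\hat{m}_i^T \hat{m}^t\right) - C^t \right). \tag{8}$$

We use momentum to update the values of $L[\hat{x}, y]$ at each iteration based on these derivatives as

$$L[\hat{x}, y] = L[\hat{x}, y] - \nabla[\hat{x}, y], \qquad \nabla[\hat{x}, y] = r\, \frac{\partial C^t}{\partial L[\hat{x}, y]} + \mu\, \nabla_*[\hat{x}, y], \tag{9}$$

where $\nabla_*[\hat{x}, y]$ is the previous update value, $r$ is the learning rate, and $\mu$ is the momentum factor. In
our experiments, we set $\mu = 0.9$, run stochastic gradient descent for 20 epochs with $r = 100$, and
another 10 epochs with $r = 10$. We retain the values of $L$ from each epoch, and our final output is
the version that yields the lowest mean illuminant estimation error on the validation set.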
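
The gradient computation and momentum update can be sketched in NumPy as below, assuming the softmax
form of $p_i$ with $\alpha = 1$ and a per-image cost equal to the $p$-weighted sum of candidate angular errors.
Array layouts and function names are hypothetical, and this is a sketch of the technique rather than the
authors' training code.

```python
import numpy as np

def cost_and_grad(L, chroma_idx, lum_idx, b, errors, beta=1.0):
    """Per-image cost and its gradient with respect to the table L.
    chroma_idx: (M, N) chromaticity bin per candidate and pixel.
    lum_idx: (N,) luminance bins.  b: (M,) log prior.
    errors: (M,) angular error of each candidate w.r.t. the true illuminant."""
    M, N = chroma_idx.shape
    lum = np.broadcast_to(lum_idx, (M, N))
    l = L[chroma_idx, lum].sum(axis=1) / N + beta * b
    p = np.exp(l - l.max())
    p /= p.sum()
    cost = p @ errors
    dl = p * (errors - cost)                 # derivative of cost w.r.t. l_i
    grad = np.zeros_like(L)
    # chain rule: each (candidate, pixel) look-up contributes dl_i / N
    np.add.at(grad, (chroma_idx, lum), np.broadcast_to(dl[:, None] / N, (M, N)))
    return cost, grad

def momentum_step(L, grad, prev_update, rate, mu=0.9):
    """One momentum update; returns the new table and the update applied."""
    update = rate * grad + mu * prev_update
    return L - update, update
```

Only table entries that are actually looked up for some pixel and candidate receive a non-zero gradient,
so each step touches a small part of $L$.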

We show the belief values learned in this manner in Fig. 1(c). Notice that although they retain the
overall biases towards desaturated colors and combined green-red and green-blue hues, they are less
“smooth” than their empirical counterparts in Fig. 1(b): in many instances, there are sharp changes
in the values $L[\hat{x}, y]$ for small changes in chromaticity. While harder to interpret, we hypothesize
that these variations result from shifting beliefs of specific $(\hat{x}, y)$ pairs to their neighbors, when they
correspond to incorrect choices within the ambiguous set of specific observed colors.

Experimental Results. We also report errors when using these end-to-end trained versions of the
belief function $L$ in Table 1, and find that they lead to an appreciable reduction in error in comparison
to their empirical counterparts. Indeed, the errors with end-to-end training using three-fold
cross-validation begin to approach those of the empirical version with ten-fold cross-validation, which
has access to much more training data. Also note that the most significant improvements (for both
three- and ten-fold cross-validation) are in “outlier” performance, i.e., in the 75%-ile and 90%-ile error
values. Color constancy methods perform worst on images that are dominated by a small number of
materials with ambiguous chromaticity, and our results indicate that end-to-end training increases
the reliability of our estimation method in these cases.

We also include results for the end-to-end case for the example images in Figure 2. For all three
images, there is an improvement in the global estimation error. More interestingly, we see that the 
per-pixel error and variance maps now have more high-frequency variation, since L now reacts more 
sharply to slight chromaticity changes from pixel to pixel. Moreover, we see that a larger fraction of 
pixels generate fairly accurate estimates by themselves (blue shirt in row 2). There is also a higher 
disparity in belief variance, including within regions that visually look homogeneous in the input, 
indicating that the global estimate is now more heavily influenced by a smaller fraction of pixels. 

5 Conclusion and Future Work 

In this paper, we introduced a new color constancy method that is based on a conditional likelihood 
function for the true chromaticity of a pixel, given its luminance. We proposed two approaches to 
learning this function. The first was based purely on empirical pixel statistics, while the second 
was based on maximizing accuracy of the final illuminant estimate. Both versions were found to 
outperform state-of-the-art color constancy methods, including those that employed more complex 
features and semantic reasoning. While we assumed a single global illuminant in this paper, the 
underlying per-pixel reasoning can likely be extended to the multiple-illuminant case, especially 
since, as we saw in Fig. 2, our method was often able to extract reasonable illuminant estimates 
from individual pixels. Another useful direction for future research is to investigate the benefits of 
using likelihood functions that are conditioned on lightness (estimated using an intrinsic image
decomposition method) instead of normalized luminance. This would factor out the spatially-varying
scalar ambiguity caused by shading, which could lead to more informative distributions. 
Supplementary Material


A: Quantizing the space of valid chromaticity values 

We adopt a standard approach to uniformly quantize unit vectors. Consider a chromaticity unit
vector $\hat{x} = [\hat{x}_r, \hat{x}_g, \hat{x}_b]^T$. We parametrize this vector in terms of $u = \hat{x}_b$, and $\theta = \tan^{-1}(\hat{x}_g/\hat{x}_r)$.
Since the elements of $\hat{x}$ are constrained to be positive, it follows that $u \in [0, 1]$ and $\theta \in [0, \pi/2]$.
We uniformly quantize $u$ and $\theta$ in their respective domains, using 64 bins for each when
quantizing illuminant chromaticities to construct $\mathcal{M}$, and 128 bins for true pixel chromaticities to define
$L[\hat{x}, y]$.
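
A minimal sketch of this quantization is given below. Which channel plays the role of $u$, and the
flattening of the two bin indices into a single integer, are assumptions of this example; the function
names are hypothetical.

```python
import numpy as np

def quantize(x_hat, bins):
    """Map unit vectors on the non-negative octant, shape (..., 3), to a
    flat bin index in [0, bins * bins)."""
    x_hat = np.asarray(x_hat, dtype=float)
    u = x_hat[..., 2]                                  # assumed: last channel
    theta = np.arctan2(x_hat[..., 1], x_hat[..., 0])   # in [0, pi/2]
    iu = np.minimum((u * bins).astype(int), bins - 1)
    it = np.minimum((theta / (np.pi / 2) * bins).astype(int), bins - 1)
    return iu * bins + it

def bin_center(index, bins):
    """Unit vector at the center of a flat bin index."""
    iu, it = np.divmod(np.asarray(index), bins)
    u = (iu + 0.5) / bins
    theta = (it + 0.5) / bins * (np.pi / 2)
    r = np.sqrt(1.0 - u ** 2)
    return np.stack([r * np.cos(theta), r * np.sin(theta), u], axis=-1)
```

Quantizing uniformly in $u$ and $\theta$ gives bins of equal area on the sphere, and 128 bins for each yields
$128 \times 128 = 2^{14}$ chromaticity bins.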

B: Black-level offset issues with results of Exemplar-based [7] and deep-CNN methods [19] 

In this section, we compare our method to two recent semantic reasoning-based methods— 
exemplar-based [7] and deep-CNN [19]. While the illuminant estimates computed by these methods 
on the Gehler-Shi database were made available by their authors, unfortunately, they are based on 
training and testing with older incorrect versions of the ground-truth and intensity data for images 
from one of the cameras. 

Specifically, the intensities in the image files from the Canon 5D camera in the Gehler-Shi dataset
include a “black-level” offset of 129, that needs to be subtracted from the intensities of all pixels
in all color channels. This offset affects both the observed image data, as well as the ground truth 
computed from the color checker chart. While this was eventually clarified by the authors of [15] 
(and the ground-truth made available by them reflects this correction), there remain older versions 
of the ground truth without this correction, and the estimated illuminants for [7] and [19] made 
available are with respect to these old versions. 

We attempt to compare the performance of our method with that of [7, 19] in two ways. First, we
generate corrected estimates $\bar{m}^*$ from the illuminant chromaticities $\bar{m}$ estimated by these methods,
by “subtracting” the effect of the black level offset. This is done based on the true “un-normalized”
ground truth illuminants $m$ (i.e., the color of the gray squares in the color chart) as:

$$\bar{m}^* = \frac{\bar{m}}{|\bar{m}|_1} \times \left(|m|_1 + 129 \times 3\right) - [129, 129, 129]^T. \tag{10}$$

Essentially, we rendered the color of the gray square by multiplying the luminance of non-corrected
ground-truth with the estimated illuminant $\bar{m}$, and then subtracted the black level offset. We show
the errors of these corrected illuminant estimates with respect to the actual ground truth in Table 2,
and also copy the results of our method from Table 1.

Next, we do the reverse. We “add” the offset back to the illuminant estimates from our method:

$$\bar{m}^+ = \frac{\bar{m}}{|\bar{m}|_1} \times |m|_1 + [129, 129, 129]^T, \tag{11}$$

and compare these to the non-corrected ground truth with respect to which the results of [7] and [19]
are reported. Table 3 provides this comparison.
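
The two corrections can be sketched as follows, reading $|\cdot|_1$ as the sum of absolute channel values.
The function names and the `BLACK_LEVEL` constant name are hypothetical; only the offset value of 129
comes from the text.

```python
import numpy as np

BLACK_LEVEL = 129.0  # offset in the Canon 5D image files, per the text

def remove_offset(m_est, m_true):
    """Correction (10): scale the estimate to the luminance of the
    non-corrected ground truth, then subtract the black level."""
    m_est = np.asarray(m_est, dtype=float)
    m_true = np.asarray(m_true, dtype=float)
    scale = np.abs(m_true).sum() + 3 * BLACK_LEVEL
    return m_est / np.abs(m_est).sum() * scale - BLACK_LEVEL

def add_offset(m_est, m_true):
    """Correction (11): scale the estimate to the luminance of the
    corrected ground truth, then add the black level back."""
    m_est = np.asarray(m_est, dtype=float)
    m_true = np.asarray(m_true, dtype=float)
    return m_est / np.abs(m_est).sum() * np.abs(m_true).sum() + BLACK_LEVEL
```

As a sanity check, an estimate that exactly matches the chromaticity of the non-corrected ground truth
is mapped by the first correction onto the corrected ground truth, and vice versa for the second.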

We find that the proposed method outperforms [7] and [19] in both comparisons. However, the 
quantiles reported in both Table 2 and Table 3 for [7, 19] should be interpreted only as a conservative 
estimate of their performance. While the corrections in (10) and (11) allowed us to report errors for
these methods and ours with respect to a common ground truth, they do not correct for the fact that
estimates for [7, 19] were computed on offset data. In particular, the presence of this offset means 
that the linear relationship (2) between observed colors and the scene illuminant no longer holds. 
Table 2: Comparison with respect to actual ground-truth, with correction (10) applied to [7, 19].

Method                            Mean     Median   Tri-mean   25%-ile   75%-ile   90%-ile
Exemplar-Based [7]                3.66°    2.91°    3.04°      1.52°     4.81°     7.22°
Deep-CNN [19]                     3.45°    2.45°    2.65°      1.40°     4.29°     7.23°
Proposed
  (3-Fold) Empirical              2.89°    1.89°    2.15°      1.15°     3.68°     6.24°
  (3-Fold) End-to-end Trained     2.56°    1.67°    1.89°      0.91°     3.30°     5.56°
  (10-Fold) Empirical             2.55°    1.58°    1.83°      0.85°     3.30°     5.74°
  (10-Fold) End-to-end Trained    2.20°    1.37°    1.53°      0.69°     2.68°     4.89°

Table 3: Comparison with respect to non-corrected ground-truth used in [7, 19], with correction (11)
applied to our estimates.

Method                            Mean     Median   Tri-mean   25%-ile   75%-ile   90%-ile
Exemplar-Based [7]                2.89°    2.27°    2.42°      1.29°     3.84°     5.68°
Deep-CNN [19]                     2.63°    1.98°    2.13°      1.16°     3.41°     5.46°
Proposed
  (3-Fold) Empirical              2.46°    1.55°    1.75°      0.89°     3.04°     5.05°
  (3-Fold) End-to-end Trained     2.21°    1.39°    1.54°      0.71°     2.65°     4.79°
  (10-Fold) Empirical             2.18°    1.23°    1.46°      0.65°     2.73°     4.63°
  (10-Fold) End-to-end Trained    1.91°    1.11°    1.24°      0.54°     2.21°     4.08°